ABSTRACT

The thymine analog 5-chlorouridine, first reported in the 1950s as an anti-tumor agent, is known as an effective mutagen, clastogen and toxicant, as well as an effective inducer of sister-chromatid exchange. Recently, the first microorganism with a chemically different genome was reported. The selected Escherichia coli strain relies on the four building blocks 5-chloro-2'-deoxyuridine (ClU), A, C and G instead of the standard T, A, C, G alphabet [Marlière,P., Patrouix,J., Döring,V., Herdewijn,P., Tricot,S., Cruveiller,S., Bouzon,M. and Mutzel,R. (2011) Chemical evolution of a bacterium's genome. Angew. Chem. Int. Ed., 50, 7109-7114]. The residual fraction of T in the DNA of adapted bacteria was <2%. The switch from T to ClU was accompanied by a massive number of mutations, including >1500 A to G or G to A transitions in a culture. The former (A to G transitions) are most likely due to wobble base pairing between ClU and G, which may be more common for ClU than for T. To identify potential changes in the geometries of base pairs and duplexes resulting from the replacement of T by ClU, we determined four crystal structures of a B-form DNA dodecamer duplex containing ClU:A or ClU:G base pairs. The structures reveal nearly identical geometries of these pairs compared with T:A or T:G, respectively. The substitution has no consequences for duplex stability or for cleavage by the restriction endonuclease EcoRI. The lack of significant changes in the geometry of ClU:A and ClU:G base pairs relative to the corresponding native pairs is consistent with the sustained, unlimited self-reproduction of E. coli strains with virtually complete T→ClU genome substitution.



INTRODUCTION 

Uracil analogs with halogen substitution at the 5-position represent an important class of compounds with regard to their mutagenic activity (1). Such analogs were first synthesized in the 1950s as potential anti-tumor agents (2,3). The 5-fluorouracil (FU) analog is a well-known anti-cancer drug for the treatment of human malignancies (4). 5-Chlorouracil and 5-bromouracil (ClU and BrU, respectively) are associated with inflammation and are considered to be carcinogenic (5). 5-Iodouracil (IU) was shown to have lethal and mutagenic effects on bacteriophage T4 (6).

Halogenated uracil residues are expected to exhibit base pairing properties in double-stranded nucleic acids that are closely related to those of thymine. They should therefore form complementary pairs with A or wobble pairs with G, stabilized by Watson-Crick (W-C) hydrogen bonds. However, substitution at the 5-position of uracil can substantially alter the physical (electronic) and chemical properties of the nucleobase, as evidenced by changes in the UV spectra and in the pKa values (7). The effects of incorporating FU into DNA on its structure and dynamics have been studied quite extensively. This is arguably because of the interesting pharmacological properties of this U analog as an anti-cancer agent (8-10). Interest in other halouracil analogs has focused primarily on their use as radiosensitizing agents in human cancers. BrU and IU were shown to be more effective at sensitizing tumor cells to killing by ionizing radiation (11,12). However, ClU is not as sensitive to ionizing radiation as other thymine analogs (3), even though it is an effective mutagen, clastogen and toxicant, as well as an effective inducer of sister-chromatid exchange. This relative lack of interest in ClU may explain why a thorough structural investigation is still lacking of how replacing T with ClU affects base pairing geometry and duplex conformation.
Marlière et al. (13) recently evolved genomic DNA composed of the three canonical bases A, C and G and the artificial base ClU. They used an Escherichia coli strain that lacks thymidylate synthase and requires exogenous T. Selection over 25 weeks in a specially developed cultivation device yielded descendants that grew essentially with only ClU instead of T. The DNA of adapted bacteria contained 90% ClU and 10% T. This residual T fraction was forced to <2% by disrupting the trmA gene for tRNA U54 methyltransferase.

This pioneering study prompted us to analyze the geometries of the base pairs formed between ClU and A or G in more detail. To date, no crystallographic study of a DNA duplex comprising ClU:A or ClU:G base pairs has been reported. A nuclear magnetic resonance (NMR) investigation by Theruvathu et al. (14) did not reveal any substantial difference between the geometries of ClU:A and T:A base pairs. Likewise, ClU:G and T:G wobble pairs were found to adopt similar geometries (15). In another study, the stacking patterns of a halogenated (F, Cl or Br) uridine overhang at the 3'-terminus of an octamer RNA:DNA hybrid duplex were analyzed in the presence of rhodium or iridium hexammine salts (16). The crystal structures with ClU and BrU were similar in that the dangling ends were located atop the terminal base pair, whereas FU was ejected from the helical stack.

We compared the geometries of ClU:A and ClU:G base pairs with those of the corresponding T:A and T:G pairs, respectively, and examined possible effects of these artificial pairs on the conformation of duplex DNA. To this end, we determined crystal structures of four Dickerson-Drew Dodecamer (DDD) B-form duplexes in complex with Bacillus halodurans RNase H (BhRNase H) at resolutions between 1.5 and 1.7 Å:

- two duplexes with two ClU:A pairs each: [d(CGCGAA{ClU}TCGCG)]2, referred to here as ClU7, and [d(CGCGAAT{ClU}CGCG)]2, referred to as ClU8;
- one duplex with four ClU:A pairs: [d(CGCGAA{ClU}{ClU}CGCG)]2, referred to as ClU7/8;
- one duplex with two ClU:G base pairs: [d(CGCGAATT{ClU}GCG)]2, referred to as ClU9.

Crystals grown for the duplexes alone were not of diffraction quality, and we therefore resorted to using RNase H as a scaffold. DNA duplexes act as inhibitors of RNase H (17). The enzyme binds double-stranded DNA non-specifically and without perturbing the structure of the central T:A region of the duplex (18). It therefore does not compromise our geometric analysis of ClU:A or ClU:G pairs (the latter being adjacent to the four central T:A pairs).
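The residue numbering used in the figure captions (e.g. A5:ClU20, ClU8:A17, C1:G24) follows from the self-complementarity of the four designs. Residue i of one strand (1-12) pairs with residue 25 - i of the other strand (13-24). The sketch below enumerates the pairs and classifies them. In it, "X" is a hypothetical one-letter token for ClU, which is simply assumed to pair like T. The pairing rules are a simplification for illustration and do not reproduce any analysis from the study.

```python
# Sketch: enumerate base pairs in the self-complementary DDD duplexes.
# "X" is a hypothetical token standing for ClU; it is assumed to pair like T.
DESIGNS = {
    "DDD": "CGCGAATTCGCG",
    "ClU7": "CGCGAAXTCGCG",
    "ClU8": "CGCGAATXCGCG",
    "ClU7/8": "CGCGAAXXCGCG",
    "ClU9": "CGCGAATTXGCG",
}

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
                ("A", "X"), ("X", "A")}
WOBBLE = {("G", "T"), ("T", "G"), ("G", "X"), ("X", "G")}


def name(base: str) -> str:
    """Display name of a base token."""
    return "ClU" if base == "X" else base


def base_pairs(seq: str):
    """Return (label, kind) for each pair i:(25 - i) of the duplex [seq]2.

    Strand 1 residues are numbered 1-12 and strand 2 residues 13-24, so
    residue i of strand 1 pairs with residue 25 - i of strand 2.
    """
    assert len(seq) == 12
    pairs = []
    for i in range(1, 13):
        j = 25 - i                  # partner residue number on strand 2
        b1 = seq[i - 1]
        b2 = seq[j - 13]            # strand 2 has the same 5'->3' sequence
        if (b1, b2) in WATSON_CRICK:
            kind = "W-C"
        elif (b1, b2) in WOBBLE:
            kind = "wobble"
        else:
            kind = "mismatch"
        pairs.append((f"{name(b1)}{i}:{name(b2)}{j}", kind))
    return pairs


if __name__ == "__main__":
    for label, seq in DESIGNS.items():
        wobbles = [p for p, k in base_pairs(seq) if k == "wobble"]
        print(label, "wobble pairs:", wobbles or "none")
```

For ClU9, this numbering places the two wobble pairs at ClU9:G16 and G4:ClU21. In ClU8, it yields the A5:ClU20 and ClU8:A17 pairs that are named in the figure captions.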

Our structures reveal nearly identical geometries of the ClU:A and ClU:G base pairs compared with T:A and T:G, respectively. In line with the structural similarities, UV melting experiments uncovered only very minor consequences of replacing T by ClU opposite A or G. This held relative to the parent duplexes with T:A or T:G pairs, respectively. These observations regarding structure and stability were mirrored at the level of function. The restriction endonuclease EcoRI recognizes and cleaves the natural recognition sequence G|AATTC ('|' marks the cleavage site) in the native DDD. It showed no difference in this ability when presented with duplexes featuring the modified recognition sequences G|AA(ClU)TC or G|AAT(ClU)C.



MATERIALS AND METHODS 

Protein expression and purification 

Bacillus halodurans genomic DNA was purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA). The Asp132→Asn mutant of BhRNase H (Met58 to Lys196) was expressed in E. coli and purified as described previously (18). The protein solution was concentrated to 25 mg/ml.

Synthesis of 5'-O-dimethoxytrityl-5-chloro-2'-deoxyuridine, 3'-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite and incorporation of ClU into oligonucleotides

4,4'-Dimethoxytrityl chloride (930 mg, 2.75 mmol) was added in one portion at room temperature (RT) to a colorless solution of 5-chloro-2'-deoxyuridine (19) (600 mg, 2.28 mmol) in pyridine (20 ml). The reaction mixture was stirred for 12 h, during which it turned yellow-orange. After the starting material had been consumed, the reaction mixture was cooled in an ice bath and methanol (1 ml) was added. The mixture was then concentrated and co-evaporated twice with toluene. The residue was dissolved in dichloromethane, washed with H2O and dried over Na2SO4. Purification by column chromatography on silica gel yielded 5'-O-dimethoxytrityl-5-chloro-2'-deoxyuridine (1.08 g, 84%).

This compound (1.08 g, 1.91 mmol) was dissolved in dichloromethane (10 ml) and cooled in an ice bath. N,N-Diisopropylethylamine (1.5 ml, 8.76 mmol) and 2-cyanoethyl N,N-diisopropylchlorophosphoramidite (0.58 ml, 2.6 mmol) were added, and the reaction solution was stirred for 30 min at RT. Upon completion, the reaction mixture was concentrated and co-evaporated twice with toluene. The crude material was purified by column chromatography on silica to yield 5'-O-dimethoxytrityl-5-chloro-2'-deoxyuridine, 3'-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite (1.0 g, 68%).

Characterization: 31P NMR (CDCl3, 25°C): δ = 149.08, 148.7. High-resolution mass spectrometry (HRMS) calculated for C39H46ClN4O8P, [MH+] 765.2794, found 765.2820.
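As a quick consistency check on the quantities above, the sketch below recomputes the millimole amounts and the two step yields from molecular formulas, using rounded average atomic masses. The text states only the phosphoramidite formula. The formulas for 5-chloro-2'-deoxyuridine, DMTr-Cl and the 5'-O-DMTr intermediate are derived here from the named structures, so treat them as assumptions of this sketch.

```python
# Sketch: recompute mmol amounts and step yields of the two-step synthesis
# from molecular formulas, using rounded average atomic masses.
import re

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "Cl": 35.45, "P": 30.974}


def molar_mass(formula: str) -> float:
    """Average molar mass (g/mol) of a simple formula such as 'C9H11ClN2O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def mmol(mass_mg: float, formula: str) -> float:
    """Amount of substance in mmol for a mass given in mg."""
    return mass_mg / molar_mass(formula)


def yield_percent(product_mg: float, product_formula: str, limiting_mmol: float) -> float:
    """Isolated yield relative to the limiting reagent."""
    return 100.0 * mmol(product_mg, product_formula) / limiting_mmol


# Formulas derived from the named structures (assumption, except AMIDITE)
CLU_DR = "C9H11ClN2O5"     # 5-chloro-2'-deoxyuridine
DMT_CL = "C21H19ClO2"      # 4,4'-dimethoxytrityl chloride
DMT_CLU = "C30H29ClN2O7"   # 5'-O-DMTr-5-chloro-2'-deoxyuridine
AMIDITE = "C39H46ClN4O8P"  # 3'-phosphoramidite (formula given for HRMS)

start = mmol(600, CLU_DR)
step1 = yield_percent(1080, DMT_CLU, start)
step2 = yield_percent(1000, AMIDITE, mmol(1080, DMT_CLU))
print(f"start {start:.2f} mmol; DMTr-Cl {mmol(930, DMT_CL) / start:.2f} equiv")
print(f"step 1 yield {step1:.0f}%, step 2 yield {step2:.0f}%")
```

With these masses, the sketch reproduces the 2.28 mmol starting amount, the 1.91 mmol of intermediate and the 84% and 68% yields quoted above. About 1.2 equivalents of DMTr-Cl were used.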

All four ClU-modified DDD oligonucleotides, purified by high-pressure liquid chromatography, were purchased from TriLink (San Diego, CA, USA). The DNAs were annealed and mixed with the protein at a 1.2:1 molar ratio in the presence of 5 mM MgCl2.

Crystallization and structure determination 

Crystallization experiments were performed by the sitting-drop vapor diffusion technique at 4°C using a sparse matrix screen (Hampton Research, Aliso Viejo, CA, USA) (20). A volume of 1 µl complex solution was mixed with 1 µl of reservoir solution and equilibrated against 40 µl reservoir wells. Crystals appeared within 2-3 days in droplets containing 0.2 M magnesium acetate, 0.1 M sodium cacodylate (pH 6.5) and 20% (w/v) PEG 8000. Crystals were mounted in nylon loops, cryo-protected in reservoir solution containing 20% glycerol and frozen in liquid nitrogen. Diffraction data were collected using either a Mar225 or a Mar300 CCD detector on the 21-ID-F/D beamlines of the Life Sciences Collaborative Access Team (LS-CAT) at the Advanced Photon Source, Argonne National Laboratory (Argonne, IL, USA). Data were integrated and scaled with the program HKL2000 (21).

The structures were determined by molecular replacement using the program MOLREP (22,23), with the BhRNase H structure with PDB ID code 3D0P (protein alone) as the search model. Initial refinement was carried out with the program REFMAC (24). DNA duplexes were then gradually built into the electron density, 2-3 bp at a time, followed by further refinement. Manual rebuilding was performed with the program COOT (25). Water molecules and metal ions were added gradually, and isotropic/Translation-Libration-Screw-motion (TLS) refinement was continued with the program PHENIX (26). A summary of crystallographic parameters is provided in Table 1. Helical parameters were calculated with the program CURVES (27). Illustrations were generated with the program UCSF Chimera (28).

UV thermal melting 

Melting of each oligonucleotide (~7.5 µM duplex concentration) was performed in 10 mM sodium phosphate, pH 7.4, 0.1 mM EDTA and 150 mM NaCl. Absorbance versus temperature profiles were measured with a Varian Cary 300 spectrophotometer at 260 nm and 1-cm path length, at heating or cooling rates of 0.5°C/min. Melting temperatures were determined with the Varian Cary UV Tm Analysis software. They are averages of the maxima of the first derivative of the 95-point smoothed curves from heating and cooling experiments. The final Tm values are based on six independent measurements. Hyperchromicities were determined by calculating the difference in absorbance between the high- and low-temperature baselines and dividing it by the absorbance of the low-temperature baseline.
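The first-derivative procedure and the hyperchromicity definition can be sketched as follows. The Cary software's 95-point smoothing algorithm is not described in the text, so a centered moving average is assumed in its place. The melting profile is a synthetic two-state (logistic) curve, not measured data. The Tm is taken as the mean of the heating and cooling results, as described above.

```python
# Sketch of first-derivative Tm determination and hyperchromicity.
# Assumptions: a centered moving average replaces the (unspecified) smoothing
# of the Cary software, and the melting curves are synthetic.
import math


def moving_average(values, window):
    """Centered moving average; the window shrinks at the ends of the data."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def tm_from_first_derivative(temps, absorbance, window=95):
    """Temperature at which dA/dT of the smoothed melting curve is maximal."""
    smooth = moving_average(absorbance, window)
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (smooth[i + 1] - smooth[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def hyperchromicity(a_low, a_high):
    """(A_high - A_low) / A_low, in percent."""
    return 100.0 * (a_high - a_low) / a_low


def two_state_curve(temps, tm, width=3.0, a_low=1.0, a_high=1.155):
    """Synthetic sigmoidal melting profile with flat baselines (illustration only)."""
    return [a_low + (a_high - a_low) / (1.0 + math.exp(-(t - tm) / width))
            for t in temps]


temps = [10.0 + 0.1 * i for i in range(801)]   # 10-90 °C in 0.1 °C steps
heating = two_state_curve(temps, tm=62.6)
cooling = two_state_curve(temps, tm=62.6)      # no hysteresis in this toy example
tm = (tm_from_first_derivative(temps, heating) +
      tm_from_first_derivative(temps, cooling)) / 2
print(f"Tm ~ {tm:.1f} °C, hyperchromicity {hyperchromicity(1.0, 1.155):.1f}%")
```

Real heating and cooling curves usually differ slightly. Averaging the two derivative maxima, as done for the reported values, partly cancels the lag from a finite temperature ramp.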



Cleavage assays 

Cleavage assays with EcoRI were conducted following a previous protocol (29) with modifications. Briefly, 5 µM 32P-labeled DNA was incubated at 37°C with EcoRI buffer and 20 U of EcoRI (High-Fidelity; New England Biolabs, Ipswich, MA, USA) in a volume of 10 µl. Final concentrations were 50 mM potassium acetate, 20 mM Tris-acetate (pH 7.5), 10 mM magnesium acetate and 1 mM dithiothreitol. Reactions were started by adding EcoRI. At various times, a 1-µl aliquot of the reaction mixture was removed and added to 9 µl of 20 mM EDTA (pH 9.0) in 95% (v/v) formamide to stop the reaction. Samples were then analyzed by 20% PAGE.
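Two small calculations clarify this protocol. First, the quench dilutes the Mg2+ tenfold while leaving EDTA in large excess. Second, a time course can be expressed as the fraction of labeled substrate converted to product in each lane. The text compares time courses only qualitatively. The band intensities below are hypothetical placeholders, and the helper names are illustrative, not part of the described method.

```python
# Sketch: (1) check that the quench step chelates all Mg2+, and
# (2) turn (hypothetical) gel band intensities into a cleavage time course.

def diluted(conc_mM, aliquot_ul, total_ul):
    """Concentration after mixing aliquot_ul of a solution into a final total_ul."""
    return conc_mM * aliquot_ul / total_ul


# 1 µl reaction (10 mM Mg2+) + 9 µl stop solution (20 mM EDTA) = 10 µl
mg_final = diluted(10.0, 1.0, 10.0)      # 1.0 mM Mg2+
edta_final = diluted(20.0, 9.0, 10.0)    # 18.0 mM EDTA
assert edta_final > mg_final             # EDTA in large excess over Mg2+


def fraction_cleaved(product, substrate):
    """Fraction of labeled DNA converted to product in one gel lane."""
    total = product + substrate
    return product / total if total > 0 else 0.0


# Hypothetical lanes: time (min) -> (product, substrate) band intensities
lanes = {0: (0.0, 100.0), 5: (40.0, 60.0), 15: (75.0, 25.0), 30: (95.0, 5.0)}
course = {t: fraction_cleaved(p, s) for t, (p, s) in lanes.items()}
print(f"EDTA {edta_final:.0f} mM vs Mg2+ {mg_final:.0f} mM;", course)
```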

Coordinates 

Final coordinates and structure factors for all four dodecamer complexes have been deposited in the Protein Data Bank (http://www.rcsb.org). The PDB ID codes are 4HUF (ClU7), 4HTU (ClU8), 4HUG (ClU7/8) and 4HUE (ClU9).



RESULTS 

Overall structures of the ClU-modified DDD:RNase H complexes

To gain a better understanding of the potential structural changes caused by replacing the thymine 5-methyl group with chlorine, we selected the DDD B-form DNA as a template. We synthesized four modified DDDs with ClU located either opposite A (ClU7, ClU8, ClU7/8) or opposite G (ClU9). Crystals of the oligonucleotides alone were not of sufficient quality for a detailed structural analysis. We therefore used crystals of these modified DDDs in complex with BhRNase H to study the geometries of ClU:A and ClU:G pairs. In the crystal of the native DDD bound to RNase H studied earlier (18), protein-DNA contacts were limited to the backbones of the CGCG tracts at both ends of the sequence CGCGAATTCGCG. We therefore expected that binding of RNase H would not affect base pairing in the central GAATTC region, where we replaced T opposite either A or G by ClU (Figure 1).

Table 1. Crystal data, data collection parameters and structure refinement statistics

Structure/duplex                    ClU8            ClU9            ClU7            ClU7/8
Space group                         P2₁2₁2₁ (all four structures)
Cell dimensions
  a (Å)                             64.08           64.36           64.19           63.83
  b (Å)                             64.76           64.81           64.75           64.64
  c (Å)                             116.47          116.62          116.47          116.29
Data collection
  Wavelength (Å)                    0.97872 (all four data sets)
  Resolution (Å)                    35.88-1.49      33.28-1.56      33.22-1.69      43.23-1.64
  Outer shell (Å)                   1.52-1.49       1.59-1.56       1.72-1.69       1.67-1.64
  Unique reflections (outer shell)  77 514 (3752)   68 234 (3339)   54 372 (2620)   59 446 (2888)
  Completeness (outer shell) (%)    96.9 (95.1)     97.3 (96.6)     98.3 (97.3)     99.7 (99.3)
  R-merge (outer shell)             0.071 (0.824)   0.082 (0.800)   0.061 (0.675)   0.072 (0.674)
  I/σ(I) (outer shell)              27.3 (2.5)      19.8 (2.1)      29.8 (2.9)      26.7 (3.0)
Refinement
  Working set reflections           74 662          68 154          54 301          59 370
  Test set reflections              3759            3453            2766            2996
  R-work/R-free                     0.196/0.223     0.197/0.226     0.202/0.237     0.197/0.220
  No. of protein/DNA atoms          2244/896        2203/897        2192/833        2215/833
  No. H2O/Mg2+/ligands              476/2/4         523/2/4         375/2/4         341/2/4
  Average B-factor (Å²)             30.18           33.43           27.80           27.96
  RMSD bonds (Å)                    0.009           0.009           0.008           0.008
  RMSD angles (°)                   1.4             1.3             1.3             1.4

Figure 1. Overall structure of the BhRNase H:ClU8-DDD (ClU8) complex. The asymmetric unit contains two independent complexes, each consisting of a single RNase H molecule bound to a modified DDD duplex. In all four structures, protein chains are labeled a (complex 1) and b (complex 2). Duplex strands are labeled c and d (complex 1) and e and f (complex 2), with nucleotides numbered 1-12 (c/e) and 13-24 (d/f). (A) The duplex in the first complex (green) is fully resolved in the electron density map. It is contacted by three RNase H molecules (white, gray or dark gray; symmetry mates are marked with a hash symbol). (B) The duplex in the second complex (beige) is contacted by two RNase H molecules (pink or magenta). Four C:G pairs at one end of this duplex are not visible in the electron density map. (C) The superimposition of the two independent complexes illustrates that the ClU8-DDDs exhibit similar conformations. It also shows virtually identical orientations of the original RNase H molecule relative to the DNA duplex. ClU residues are labeled, and chlorine atoms are highlighted as light green spheres. Selected water molecules are shown as cyan spheres, and RNase H amino acids interacting with the DNA are labeled. The so-called phosphate (P-) binding site is marked by an arrow.

Crystals diffracted to resolutions between 1.5 and 1.7 Å and belong to space group P2₁2₁2₁ (Table 1). The structures were solved by molecular replacement, using the protein portion of the complex between RNase H and the native DDD (18) as the search model. All four crystals of the complexes with ClU-modified DDDs are isomorphous (Table 1). In the structure of the complex between RNase H and the native DDD, the duplex sits on a dyad, and the asymmetric unit contains two protein molecules and two DNA single strands. Crystals of the complexes with ClU-modified DDDs differ: they feature two independent complexes, each consisting of RNase H bound to a duplex (Figure 1A and B). The first of these displays clear electron density around all nucleotides. In the second, the density around the DNA is only partially resolved, and either three or four base pairs at one end of the duplex are missing in the individual complex crystals (Figure 1B and C). Similarly, some N-terminal amino acids of the RNase H molecule in the second complex could not be resolved in the electron density maps: two in the ClU7 crystal structure and three in the ClU7/8 crystal structure. Examples of the quality of the final electron density are shown in Figure 2 and Supplementary Figures S1-S3.

Inspection of the lattice interactions in the structures of the complexes reveals why four base pairs in the second duplex are less ordered. Three RNase H molecules interact with the duplex from the first complex. Two proteins cradle one end, and a loop of a third protein, comprising residues T90, G91 and E92, stacks against the terminal base pair at the other end (Figure 1A). In contrast, the second duplex interacts with RNase H molecules at only one end. The other end is not stabilized by protein contacts and juts out into the solvent (Figure 1B). A comparison of the orientations of RNase H molecules around the two independent duplexes reveals similar positions of the protein that binds a phosphate group at its P-binding site. By contrast, the second protein molecules contacting the same end of the two duplexes are considerably shifted relative to each other (Figure 1C).

Figure 2. Quality of the final electron density. Fourier (2Fo-Fc) sum electron density drawn at the ~1.0σ threshold (A) around the central ApApTpClU tetramer in the ClU8 duplex and (B) around base pair A5:ClU20 (top) and base pair ClU8:A17 (bottom). Atoms are colored beige, red, blue, orange and light green for carbon, oxygen, nitrogen, phosphorus and chlorine, respectively.

Protein-DNA interactions in the ClU-DDD:RNase H complex structures

The two complexes per asymmetric unit in the four crystal structures exhibit similar interactions between RNase H and the ClU-modified DDD (Figure 1A, white RNase H a and green duplex; Figure 1B, pink RNase H b and beige duplex). In both cases, the phosphate group of C3 is lodged in the phosphate-binding pocket (Figure 3). In the structure of an RNA:DNA hybrid bound to BhRNase H (30), this pocket harbors a phosphate of a DNA nucleotide separated by 2 bp from the scissile RNA phosphate. In the crystal structure of the complex between RNase H and the native DDD, the phosphate of the terminal G12 sits at the active site and thus mimics a phosphate from the RNA strand (18). In that structure, the phosphate of residue G4 from the paired strand is anchored in the phosphate-binding pocket. Compared with these interaction modes of the RNA:DNA hybrid and the native DDD, the ClU-modified duplexes interact less intimately with the protein. Only phosphates from one strand are contacted by amino acids: P3 by T104, S147 and T148, and P4 by W139 (Figure 3). As a consequence, only one Mg2+ ion (MgB) is observed at the protein active site. Its coordination sphere comprises D71, E109 and four water molecules (Figure 3).

In addition to the protein-DNA contacts involving phosphate groups, four residues from a symmetry-related RNase H molecule interact with the terminal base pair of the first duplex. R126 and P177 stack onto the nucleobases of G24 and C1, respectively. I64 and L179 form hydrophobic interactions with the base and sugar moieties, respectively, of nucleotide C1 (Figures 1A and 3). In the second complex, the symmetry-related RNase H molecule adopts a somewhat different orientation (Figure 1A, gray RNase H b# versus Figure 1B, magenta RNase H a#). Here, the amino acids contacting the terminal base pair include K143, N170 and T173 (Figure 1B). However, in none of the four crystal structures does either duplex 1 or duplex 2 exhibit interactions between RNase H and A or ClU/T nucleotides from the central tetramer.

Geometry of ClU:A and ClU:G base pairs

A central question of our structural studies was whether the pairing modes and/or geometries of ClU:A and ClU:G base pairs change relative to the native T:A and T:G pairs, respectively. Analysis of the duplexes containing either two or four ClU:A pairs reveals that these pairs are of the standard W-C type, with two hydrogen bonds. We compared the geometries of ClU:A and T:A pairs in two ways: by superimposing the ClU7 and ClU8 duplexes (Figure 4A), and by superimposing the ClU7/8 and native DDD (31) duplexes (Figure 4C). Both comparisons demonstrate that replacing the methyl group of T by chlorine is of little consequence. The ClU:A and T:A pairs neatly overlap. The only exception is the positions of the methyl carbon and the chlorine, which deviate somewhat because the C-Cl bond is longer than the C-CH3 bond.

Similarly, the ClU:G pair adopts the familiar wobble geometry, like T:G. G is shifted into the minor groove and ClU into the major groove, with formation of two hydrogen bonds (Figure 4B). As with the ClU:A and T:A pairs, the superimposition of the ClU9 duplex and a DDD featuring T:G pairs (32) illustrates the nearly identical geometries of the two pairs. Moreover, we calculated geometric parameters for these duplexes (27), including rise, twist, and x- and y-displacement (see the Supplementary Material). These parameters confirm that chlorine at the 5-position of uracil, in place of a methyl group, does not cause a substantial difference in either pairing behavior or base pair geometry.
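Superpositions like those in Figure 4 are normally computed as a least-squares fit of matched atoms. The text does not say which program or atom selection was used. The sketch below is therefore a generic Kabsch superposition, which assumes that the two coordinate sets are already paired atom by atom in the same order.

```python
# Sketch: least-squares (Kabsch) superposition of two matched atom sets and RMSD.
# Assumption: atoms in P and Q are paired one-to-one and listed in the same order.
import numpy as np


def kabsch_superpose(P, Q):
    """Return (rotation R, translation t, rmsd) that best maps P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p0, Q - q0                    # center both sets
    H = Pc.T @ Qc                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - R @ p0
    diff = P @ R.T + t - Q
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, t, rmsd
```

To mimic the comparison described above, one could fit on the base-pair and backbone atoms common to both structures. The displacement between the methyl carbon of T and the chlorine of ClU would then be measured separately after applying R and t. This choice of selection is a suggestion, not the authors' protocol.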

The similarities observed at the level of individual base pairs extend to the overall conformations of the dodecamer duplexes. As mentioned above, RNase H molecules in these crystal structures contact the ClU-modified duplexes exclusively in the CG portions. The contacts occur either at both ends (duplex 1, Figure 1A) or at only one end (duplex 2, Figure 1B). As a result, the narrow minor groove of the A-tract, a hallmark of the DDD, remains largely unaffected by the interactions with the protein (Figure 5). Introducing T:G mismatches at the border of the A-tract widens the minor groove at the location of the mismatch pairs, in both the ClU9 and the G:T DDDs. However, replacing T by ClU has no obvious additional effect on groove width. The depth of the minor groove varies much less than its width along the entire dodecamer. Neither G:T mismatch pairs nor ClU:A/G pairs cause any significant change in depth relative to the native DDD (Figure 5).

Thermodynamic stability of ClU-modified duplexes 

To examine whether replacement of T by ClU affects duplex stability, we carried out UV melting studies with the native DDD, a DDD with T:G mismatches and all four ClU-modified DDDs (Table 2). The melting temperatures of duplexes with two (ClU7, ClU8) or four (ClU7/8) ClU:A pairs vary only slightly and are similar to the Tm of the native DDD. Insertion of T:G mismatches causes a steep drop in stability, and the drop is similar for the duplex featuring two ClU:G pairs (ClU9). Overall, these data argue against either a stabilizing or a destabilizing effect of the ClU nucleotide analog in B-form DNA.

Figure 3. Interactions between RNase H and ClU-modified DDDs. The original RNase H molecule (white ribbon) binds phosphates from one DNA strand. The phosphate of C3 (P3) is lodged in the phosphate-binding pocket, and W139 forms a hydrogen bond to the phosphate of G4. The active site normally harbors the scissile phosphate from the RNA strand (18). Here it remains unoccupied, and only one of the two Mg2+ ions (purple sphere) is present, with some water molecules (cyan spheres) taking the place of phosphate oxygens. Amino acid side chains from a symmetry-related RNase H molecule (I64, L179, P177 and R126) interact with the terminal C1:G24 base pair. Hydrogen bonds and the Mg2+ coordination sphere are indicated with thin solid lines. Two glycerol molecules trapped at the protein-DNA interface are highlighted with carbon atoms colored in black.

Figure 4. Comparison between the geometries of ClU:A and ClU:G and those of T:A and T:G base pairs, respectively. (A) Superimposition of the d(ClU7pT8pC9):d(G16pA17pA18) (light blue carbon atoms) and d(T7pClU8pC9):d(G16pA17pA18) (gray carbon atoms) trimer portions from the ClU7 and ClU8 structures, respectively. (B) Superimposition of the d(T8pClU9pG10):d(C15pG16pA17) (gray carbon atoms) and d(T7pT8pC9):d(G16pA17pA18) (light blue carbon atoms) trimer portions from ClU9 and a G:T mismatch-containing DDD structure [PDB ID 113D (32)], respectively. (C) Superimposition of the central hexamers from the ClU7/8 structure (gray carbon atoms) and the native DDD [PDB ID 436D (31)] (light blue carbon atoms). Chlorine atoms are highlighted in green, and hydrogen bonds are indicated with thin solid lines (omitted in panel C for clarity). In panel C, fluorine atoms of two 2'-deoxy-2'-fluoroarabino-Ts in the reference structure are highlighted in purple.
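The statement that the ClU duplexes "vary only slightly" can be made concrete by computing each Tm difference from the appropriate reference duplex in Table 2. The reference is the native DDD for ClU:A duplexes and the T:G DDD for ClU9. The sketch treats the ± values in Table 2 as independent standard deviations and combines them in quadrature. This is an assumption, since the text does not define them, and the result is a back-of-the-envelope comparison rather than a statistical test.

```python
# Sketch: Tm differences relative to the appropriate reference duplex (Table 2),
# with uncertainties combined in quadrature (assumes independent SDs).
import math

TM = {  # duplex: (Tm in °C, ± value) as listed in Table 2
    "DDD": (62.6, 0.9), "ClU7": (61.8, 1.2), "ClU8": (64.0, 0.5),
    "ClU7/8": (63.8, 0.6), "ClU9": (35.4, 0.7), "T:G DDD": (36.5, 0.5),
}
REFERENCE = {"ClU7": "DDD", "ClU8": "DDD", "ClU7/8": "DDD", "ClU9": "T:G DDD"}


def delta_tm(duplex):
    """(Tm(duplex) - Tm(reference), combined uncertainty) in °C."""
    tm, s = TM[duplex]
    tm_ref, s_ref = TM[REFERENCE[duplex]]
    return tm - tm_ref, math.hypot(s, s_ref)


for name in REFERENCE:
    d, s = delta_tm(name)
    print(f"{name:7s} vs {REFERENCE[name]:8s}: dTm = {d:+.1f} +/- {s:.1f} °C")
```

All four ClU effects fall within about 1.5 °C of their references. By comparison, the T:G mismatches lower the Tm of the native DDD by about 26 °C.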



EcoRI cleavage assays with ClU-modified DDDs 

The EcoRI endonuclease recognizes the self-complementary hexamer 5'-G|AATTC-3':3'-CTTAA|G-5'. It cleaves between G and A (|), generating sticky ends with 5'-overhangs. We carried out cleavage assays with the restriction enzyme and the ClU-modified DDDs, using the native DDD and a DDD with T:G mismatch pairs as reference duplexes (Figure 6). As expected, T:G and ClU:G mismatches abrogated cleavage. Conversely, EcoRI cleaved the native DDD and all three modified duplexes with ClU:A pairs, and the ClU modification had no obvious effect on the time course of cleavage.

Table 2. UV-melting temperatures of the native DDD, ClU-modified DDDs and a DDD with T:G mismatches

Duplex      Tm (°C)        Hyperchromicity (%)
DDD         62.6 ± 0.9     15.5
ClU7        61.8 ± 1.2     14.3
ClU8        64.0 ± 0.5     15.7
ClU7/8      63.8 ± 0.6     13.5
ClU9        35.4 ± 0.7     15.6
T:G DDD     36.5 ± 0.5     16.3

Figure 5. Minor groove widths (solid lines) and depths (dashed lines) in ClU-modified DDD duplexes in complex with RNase H. These are compared with the native DDD d(CGCGAATTCGCG) [PDB ID 436D (31)] and a DDD d(CGCGAATTTGCG) with T:G mismatch pairs [T9:G4; PDB ID 113D (32)]. All parameters were calculated with the program Curves (27).

Figure 6. PAGE assay of EcoRI cleavage experiments with ClU-modified DDDs, the native DDD d(CGCGAATTCGCG) and the DDD d(CGCGAATTTGCG) with T:G mismatch pairs (T9:G4).

DISCUSSION

Our investigation of the structure, stability and function of DNA duplexes with ClU in place of T was motivated by a recent demonstration in E. coli (13). There, the genome-wide transliteration of T to ClU was achieved by combining tight metabolic selection with long-term automated cultivation of bacterial populations. Among nucleobases, only thymine is unique to DNA. Because its metabolism is separated from RNA biosynthesis, it can be replaced in vivo by starvation and exogenous introduction of unnatural alternatives (33). Incorporation of 5-halogenopyrimidines into DNA was established decades ago (34). Among the analogs with fluoro-, bromo-, chloro- or iodo-substituents, ClU exhibits the closest likeness to T (14). It is also readily converted to the nucleoside triphosphate in the cell (35) and lacks the photolabile behavior of BrU and IU (36).

The starting point of that study was an E. coli strain lacking thymidylate synthase and supplied with exogenous ClU instead of T (13). Twenty-five weeks of selection in a cultivation device yielded a strain whose genome contained A, G, C, 90% ClU and 10% T. Additional disruption of the tRNA U54 methyltransferase gene further reduced the residual T content to <2%. This T→ClU transliteration was accompanied by more than 1500 A to G and G to A transitions in a particular culture, with the former about twice as common. The frequency of these transitions suggests that ClU is prone to mispairing with G. However, it is not clear whether ClU:G resembles the T:G pair in the wobble configuration with two hydrogen bonds. Alternatively, the mismatch pair could exhibit a different hydrogen bonding pattern, owing to subtle differences in pKa, dipole moment and/or hydration between the chlorouracil base and thymine (15).

We selected the DDD B-form DNA as a template to analyze the base pairing behavior of ClU opposite either A or G by X-ray crystallography. Because crystals of the DNAs alone did not diffract to high resolution, we decided to determine the structures of their complexes with BhRNase H. We previously found that crystals of complexes between this endonuclease and the native DDD or chemically modified DDDs diffract X-rays to resolutions of around 1.5 Å (18,29,37). Indeed, all crystals of complexes with ClU-modified DDDs diffracted to better than 1.7 Å resolution (Table 1). Two of the complexes feature duplexes with two T:A pairs replaced by ClU:A (ClU7 and ClU8). A third contains a duplex with all four T:A pairs in the central A-tract of the DDD replaced by ClU:A (ClU7/8). A fourth complex is between BhRNase H and a DDD in which two ClU:G pairs
bracket the central T:A tract. The first three structures illustrate that ClU:A and T:A pairs adopt nearly identical configurations with two hydrogen bonds (Figures 2, 4A and 4C). This confirms earlier observations based on the solution NMR structure of a B-form DNA duplex with ClU:A pairs (14). Similarly, the geometry of the ClU:G mismatch pairs in the fourth complex closely resembles that of the T:G pair in the structure of the B-form duplex of the same sequence (32) (Figure 4B). Crystals of the RNase H complexes were grown at near-neutral pH (6.5). The observed wobble configuration with two hydrogen bonds confirms previous NMR studies in solution, which reported that ClU:G and T:G pairs are similar (15).

The similarities at the level of conformation indicate that chlorine closely mimics the methyl substituent of thymine. Indeed, when both bond length and van der Waals radius are taken into account, the two substituents exhibit very similar sterics. The C(5)-Cl bond is clearly longer than the C(5)-CH3 bond (1.73 Å versus 1.5 Å, respectively). However, the van der Waals radius of a methyl group (2 Å) exceeds that of chlorine (1.75 Å) by roughly the same amount. Thus, the two substituents should be nearly indistinguishable for a protein probing the major groove of a B-form duplex. Indeed, we demonstrate here that EcoRI cuts the ClU7, ClU8 and ClU7/8 DDDs, in which two or four Ts in the recognition site are replaced by ClU (Figure 6). Conversely, replacing the C:G pairs in the target sequence with either T:G or ClU:G mismatches abolishes cleavage by EcoRI.

Chlorine does more than mimic the native methyl substituent of thymine, which allows ClU:A pairs to escape discrimination by enzymes such as EcoRI. Replacing T opposite A by ClU also does not lead to any obvious changes in the thermodynamic stability of B-form DNA (Table 2). Likewise, T:G and ClU:G mismatch pairs cause very similar losses of stability. The stability data support the notion that chlorine and methyl at the 5-position of uracil are basically interchangeable. Chlorine can, in principle, participate in halogen bonds (38). However, all observed distances between chlorine and phosphate oxygens in the ClU-modified DDD structures clearly exceed the sum of the van der Waals radii of Cl and O (3.27 Å): minimum 5.09 Å, maximum 6.23 Å and average 5.49 Å. Chlorine atoms are also poorly hydrated. The shortest contact between a chlorine and a water molecule in our structures is 3.71 Å, which is not indicative of a hydrogen bond. Finally, the native DDD and the ClU-modified DDDs have similar melting temperatures. This is inconsistent with differential polarizations of the thymine and chlorouracil nucleobases, because such differences should affect stacking and thus stability.
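The steric argument and the halogen-bond check in this paragraph both reduce to simple distance arithmetic, which can be sketched as follows. The bond lengths and the Cl radius are those quoted in the text. The O radius of 1.52 Å is the value implied by the quoted 3.27 Å Cl + O sum. The coordinates in the example are hypothetical and are not taken from the deposited models.

```python
# Sketch: (1) the steric "reach" of Cl versus CH3 at C5, and
# (2) screening Cl...O distances against the van der Waals contact sum.
import math

BOND = {"Cl": 1.73, "CH3": 1.50}            # C5-X bond length (Å), from the text
VDW = {"Cl": 1.75, "CH3": 2.00, "O": 1.52}  # van der Waals radii (Å)


def reach(substituent):
    """Distance from C5 to the outer van der Waals surface of the substituent."""
    return BOND[substituent] + VDW[substituent]


def close_contacts(cl_atoms, o_atoms, cutoff=VDW["Cl"] + VDW["O"]):
    """Return (i, j, distance) for Cl/O pairs closer than the cutoff (3.27 Å by default)."""
    hits = []
    for i, cl in enumerate(cl_atoms):
        for j, o in enumerate(o_atoms):
            d = math.dist(cl, o)
            if d < cutoff:
                hits.append((i, j, round(d, 2)))
    return hits


print(f"reach Cl = {reach('Cl'):.2f} Å, reach CH3 = {reach('CH3'):.2f} Å")
# Hypothetical coordinates: one O at 5.1 Å (no contact), one at 3.0 Å (contact)
print(close_contacts([(0.0, 0.0, 0.0)], [(5.1, 0.0, 0.0), (0.0, 3.0, 0.0)]))
```

The two reaches (about 3.48 Å for Cl and 3.50 Å for CH3) differ by only about 0.02 Å, which is the quantitative core of the "nearly indistinguishable" argument. Against the 3.27 Å cutoff, the Cl...O distances reported above (5.09 Å and longer) would return no hits.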

ClU and T are similar in pairing properties, conformation and duplex stability. This similarity helps rationalize the successful in vivo evolution of E. coli strains that rely on a ClU, A, C, G alphabet instead of T, A, C, G (13). A similar success is unlikely with the FU or BrU analogs, which deviate more from T than ClU does, both sterically and stereoelectronically. In one instance (EcoRI), we found that an enzyme appeared to ignore the switch from a methyl group to chlorine in the major groove. Nevertheless, T→ClU base substitution changes groove geometry and electrostatics in a subtle fashion. It may therefore affect nucleic acid recognition by proteins involved in replication and transcription. Indeed, in vitro kinetic assays of DNA polymerization have been carried out with human polymerase β and the E. coli DNA polymerase I exo– Klenow fragment, using templates containing either ClU or T. These assays showed facilitated incorporation of dGTP opposite ClU compared with T, particularly at increased pH (i.e. 9.0) (39). A high mutation rate is important for the evolutionary process. The massive number of A→G mutations seen in the adaptation of E. coli to ClU in place of T is consistent with facile formation of the ClU:G wobble pair.